X2.2 and X9.3 flares from #AR12673 today. They were both gorgeous flares, but there is huge difference in terms of their possible geo-impact pic.twitter.com/5caTxLZu9k
— Halo CME (@halocme) September 6, 2017
9/7/2017
Customer Impacts
The NASA Frontier Development Lab is an AI R&D accelerator that tackles knowledge gaps relevant to the space program. It is an intense, concentrated 8-week study of topics important not only to NASA but also to humanity’s future.
Also, 16 of these for 2 months
From March 2010:
https://www.ngdc.noaa.gov/stp/satellite/goes/dataaccess.html
https://satdat.ngdc.noaa.gov/sem/goes/data/new_full/2010/04/goes15/
flux <- read.csv("data/Flux_2010_2015.csv")
head(flux)
##                  Time         Flux Count
## 1 2010-01-01 00:02:00 5.403997e-08    58
## 2 2010-01-01 00:04:00 5.346478e-08    58
## 3 2010-01-01 00:06:00 5.423044e-08    59
## 4 2010-01-01 00:08:00 5.692712e-08    59
## 5 2010-01-01 00:10:00 5.496905e-08    58
## 6 2010-01-01 00:12:00 6.079832e-08    59
We need a script like the following one to pull down the CSVs.
Take the max over some interval.
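One way to take the max over an interval is to bin on a truncated timestamp and aggregate. A minimal sketch, assuming the Flux_2010_2015.csv layout shown above (hourly bins; the sample values are copied from the head() output, not real labels):

```r
# Rebuild the first six rows of the flux table shown above
flux <- data.frame(
  Time = c("2010-01-01 00:02:00", "2010-01-01 00:04:00", "2010-01-01 00:06:00",
           "2010-01-01 00:08:00", "2010-01-01 00:10:00", "2010-01-01 00:12:00"),
  Flux = c(5.403997e-08, 5.346478e-08, 5.423044e-08,
           5.692712e-08, 5.496905e-08, 6.079832e-08)
)
flux$Time <- as.POSIXct(flux$Time, tz = "UTC")
# Bin each 2-minute observation into its hour, then take the max per bin
flux$Hour <- format(flux$Time, "%Y-%m-%d %H:00:00")
hourly.max <- aggregate(Flux ~ Hour, data = flux, FUN = max)
hourly.max
```

The same pattern works for any interval: change the `format()` template (e.g. `"%Y-%m-%d"` for daily maxima).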
The Solar Dynamics Observatory (SDO) is a NASA mission which has been observing the Sun since 2010. Launched on February 11, 2010, the observatory is part of the Living With a Star (LWS) program.
From PBs to GBs…
create.file.list <- function(which.months = 1:6){
base.url <- "http://jsoc.stanford.edu/data/aia/synoptic/2014/"
months <- sprintf("%02.f", 1:12)[which.months]
max.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)[which.months]
hours <- sprintf("%02.f", 0:23)
minutes <- sprintf("%02.f", c(00, 12, 24, 36, 48))
wl <- c("0094", "0131", "0171", "0193", "0211", "0304", "0335", "1600")
target.file.list <- target.urls <- rep("", sum(max.days) * 5 * 24 * 8)
ind <- 1
for(i in 1:length(months)){
for(j in 1:(max.days[i])){
for(k in 1:length(hours)){
for(l in 1:length(minutes)){
for(m in 1:length(wl)){
aia.url <- paste0(base.url, months[i], "/",
sprintf("%02.f", j), "/H", hours[k], "00/")
aia.file <- paste0("AIA2014", months[i],
sprintf("%02.f", j), "_", hours[k], minutes[l], "_", wl[m], ".fits")
target.file.list[ind] <- aia.file
target.urls[ind] <- paste0(aia.url, aia.file)
ind <- ind + 1
}
}
}
}
}
# Return both vectors; callers index [[1]] (file names) and [[2]] (URLs)
list(target.file.list, target.urls)
}
download.aia.2014 <- function(target.dir = "data/AIA2014/",
target.file.list, target.urls){
# Check existing files
lf <- list.files(target.dir)
done.list <- which(target.file.list %in% lf)
if(length(done.list) > 0){
todo.list <- c(1:length(target.file.list))[-done.list]
} else {
todo.list <- c(1:length(target.file.list))
}
# Download only missing data (seq_along() handles an empty todo.list)
for(i in seq_along(todo.list)){
try(download.file(url = target.urls[todo.list[i]],
destfile = paste0(target.dir,
target.file.list[todo.list[i]]),
method = "libcurl"),
silent = TRUE)
}
}
We'll only look at the first two samples of data, i.e. the first 16 FITS files (two timestamps × eight wavelength channels).
temp.file.list <- create.file.list(which.months = 1)
download.aia.2014(target.dir = "data/AIA2014/",
target.file.list = temp.file.list[[1]][1:16],
target.urls = temp.file.list[[2]][1:16])
aia.index.allchannels <- function(target.dir = "data/",
target.file.dir = "data/AIA2014/",
temp.date = Sys.Date()){
lf <- list.files(target.file.dir)
time.stamp <- substr(lf, 1, 16)
wave.length <- substr(lf, 18, 21)
time.wave.tbl <- table(time.stamp, wave.length)
time.wave.tbl8 <- time.wave.tbl[, 1:8]
num.channels <- apply(time.wave.tbl8, 1, sum)
all.channels <- num.channels == 8
all.channels.times <- rownames(time.wave.tbl8)[all.channels]
year <- as.numeric(substr(all.channels.times, 4, 7))
month <- as.numeric(substr(all.channels.times, 8, 9))
day <- as.numeric(substr(all.channels.times, 10, 11))
hour <- as.numeric(substr(all.channels.times, 13, 14))
minute <- as.numeric(substr(all.channels.times, 15, 16))
AIAindex.allchannels <- cbind(year, month, day, hour, minute)
AIAindex.allchannels.POSIX <- as.POSIXct(paste(
paste(AIAindex.allchannels[, 1],
sprintf("%02.f", AIAindex.allchannels[, 2]),
sprintf("%02.f", AIAindex.allchannels[, 3]), sep = "-"),
paste(sprintf("%02.f", AIAindex.allchannels[, 4]),
sprintf("%02.f", AIAindex.allchannels[, 5]), "00", sep = ":"),
sep = " "), tz = "UTC")
write.csv(AIAindex.allchannels.POSIX,
paste0(target.dir, "AIA_index_allChannels_", temp.date,".csv"),
row.names = F)
}
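The substr() offsets above assume the fixed AIA synoptic filename layout; applied to a sample name they pick out the pieces like this:

```r
# One AIA synoptic filename, as produced by create.file.list() above
fn <- "AIA20140101_0012_0171.fits"
substr(fn, 1, 16)    # time stamp:  "AIA20140101_0012"
substr(fn, 18, 21)   # wavelength:  "0171"
substr(fn, 4, 7)     # year:        "2014"
substr(fn, 13, 14)   # hour:        "00"
substr(fn, 15, 16)   # minute:      "12"
```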
library(doParallel)
library(foreach)
library(feather)
library(reticulate)
registerDoParallel(cores=detectCores())
astropy <- import("astropy")
aia.index.allchannels(target.dir = "data/", target.file.dir = "data/AIA2014/",
temp.date = "2017-09-07")
lf <- list.files("data/AIA2014/")
target.dir = "data/"
temp.date = "2017-09-07"
all.channel.index.POSIX <- as.POSIXct(read.csv(paste0(target.dir,
"AIA_index_allChannels_",
temp.date,".csv"),
stringsAsFactors = FALSE)[[1]],
tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
all.channel.index <- outer(format(all.channel.index.POSIX, "%Y%m%d_%H%M"),
c("_0094", "_0131", "_0171", "_0193",
"_0211", "_0304", "_0335", "_1600"),
paste, sep = "")
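outer() crosses every timestamp with every wavelength suffix, so each row of all.channel.index holds the eight base names for one moment in time. A two-timestamp sketch:

```r
# Two formatted timestamps crossed with the eight channel suffixes
stamps <- c("20140101_0000", "20140101_0012")
wl <- c("_0094", "_0131", "_0171", "_0193", "_0211", "_0304", "_0335", "_1600")
idx <- outer(stamps, wl, paste, sep = "")
dim(idx)     # 2 timestamps x 8 channels
idx[1, ]     # all eight channel names for the first timestamp
```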
foreach(i = 1:nrow(all.channel.index)) %dopar% {
big.mat <- matrix(NA, nrow = 1024*1024, ncol = 8)
for(j in 1:8){
temp <- astropy$io$fits$open(paste0("data/AIA2014/AIA", all.channel.index[i, j], ".fits"))
temp$verify("fix")
exp.time <- as.numeric(substring(strsplit(as.character(temp[[1]]$header),
"EXPTIME")[[1]][2], 4, 12))
temp.mat <- temp[[1]]$data
temp.mat[temp.mat <= 0] <- 0
temp.mat <- temp.mat + 1
big.mat[, j] <- c(t(temp.mat / exp.time))
}
write_feather(as.data.frame(big.mat), paste0("data/FeatherAIA2014/AIA",
format(all.channel.index.POSIX[i], "%Y%m%d_%H%M"),
".feather"))
}
lf.train <- list.files("data/FeatherAIA2014/")
temp.mat <- read_feather(paste0("data/FeatherAIA2014/", lf.train[1]))
par(mfrow=c(2,4), mar=c(1,1,1,1), oma=c(0.5,0.5,0.5,0.5))
hist(as.matrix(temp.mat)[, 1])
par(mfrow=c(2,4), mar=c(1,1,1,1), oma=c(1,1,0.5,0.5))
for(i in 1:8){
hist(log(as.matrix(temp.mat))[, i], main = "")
}
par(mfrow=c(2,4), mar=c(1,1,1,1), oma=c(0.5,0.5,0.5,0.5))
temp.mat2 <- scale(temp.mat)
image(matrix(as.matrix(temp.mat2)[, 1], nrow = 1024), zlim = c(-2, 2),
xaxt = "n", yaxt = "n")
temp.mat2 <- scale(log(temp.mat))
image(matrix(as.matrix(temp.mat2)[, 1], nrow = 1024), zlim = c(-2, 2),
xaxt = "n", yaxt = "n")
pca.mod <- prcomp(temp.mat2)
temp.mat3 <- scale(predict(pca.mod, temp.mat2))
image(matrix(as.matrix(temp.mat3)[, 1], nrow = 1024), zlim = c(-2, 2),
xaxt = "n", yaxt = "n")
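The prcomp() step rotates each pixel's 8-channel vector onto directions of decreasing variance. A toy sketch, with random draws standing in for the 1024 × 1024 × 8 pixels:

```r
set.seed(1)
# 1000 fake "pixels" x 8 "channels"; exponential draws mimic the heavy skew
toy <- matrix(rexp(1000 * 8), nrow = 1000)
toy.scaled <- scale(log(toy))        # log + standardize, as above
pca <- prcomp(toy.scaled)
scores <- predict(pca, toy.scaled)   # pixels projected onto the PCs
dim(scores)                          # still 1000 x 8
pca$sdev                             # ordered by variance, PC1 largest
```

On the real images the first few components concentrate most of the shared structure across channels, which is what makes the PC1/PC2/PC3 maps worth plotting.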
Generating RGB images using the imager library
library(imager)
prep.img <- function(vec, expo = 1){
mat <- matrix(vec, nrow = 1024)
temp.mat <- mat - min(mat)
temp.mat <- (temp.mat / max(temp.mat))[, 1024:1]
temp.mat <- temp.mat ^ expo
c(temp.mat)
}
#expo - adjusts the shape of distribution -> adjusts RGB coloring
col <- cbind(r <- prep.img(temp.mat3[, 1], .5),
g <- prep.img(temp.mat3[, 2], .5),
b <- prep.img(temp.mat3[, 3], 3))
dim(col) <- c(1024, 1024, 1, 3)
plot(cimg(col), axes = F)
mean.log <- foreach(i = 1:length(lf.train), .combine = rbind) %dopar% {
# Per-file, per-channel mean of the log values
temp.mat <- read_feather(paste0("data/FeatherAIA2014/", lf.train[i]))
apply(log(temp.mat), 2, mean)
}
write.csv(mean.log, "data/MeanPostLog.csv", row.names = FALSE)
mean.vec <- apply(mean.log, 2, mean)
sd.log <- foreach(i = 1:length(lf.train), .combine = rbind) %dopar% {
# Per-file, per-channel sd of the log values; sd() is translation-invariant,
# so there is no need to subtract mean.vec first
temp.mat <- read_feather(paste0("data/FeatherAIA2014/", lf.train[i]))
apply(log(temp.mat), 2, sd)
}
write.csv(sd.log, "data/SdPostLog.csv", row.names = FALSE)
sd.vec <- sqrt(apply(sd.log ^ 2, 2, mean))
lf <- list.files("data/FeatherAIA2014/")
foreach(i = 1:length(lf)) %dopar% {
temp.mat <- as.matrix(read_feather(paste0("data/FeatherAIA2014/", lf[i])))
# Standardize the log values in all 8 channels, then flatten to one row
temp.mat <- round(c(t((t(log(temp.mat)) - mean.vec) / sd.vec)), 2)
write.table(matrix(temp.mat, nrow = 1),
paste0("data/CSVAIA2014/", substring(lf[i], 1, 16), ".csv"),
sep=',', row.names=FALSE, col.names=FALSE)
}
lf <- list.files("data/CSVAIA2014/", full.names = TRUE)
cat.str <- paste0("cat ", paste(lf, collapse = " "),
" > data/AIA2014_Big.csv")
cat.str
system(cat.str)
## [1] "cat data/CSVAIA2014/AIA20140101_0000.csv data/CSVAIA2014/AIA20140101_0012.csv > data/AIA2014_Big.csv"
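The shell cat call assumes a Unix-like system; the same concatenation can be done portably from R with file.append(). A self-contained sketch on two throwaway files (the real inputs would be the CSVAIA2014 files):

```r
# Demo in a temporary directory so the sketch runs anywhere
dir <- tempdir()
f1 <- file.path(dir, "a.csv"); writeLines("1,2", f1)
f2 <- file.path(dir, "b.csv"); writeLines("3,4", f2)
out <- file.path(dir, "big.csv")
file.create(out)                          # start from an empty file
invisible(lapply(c(f1, f2), function(f) file.append(out, f)))
readLines(out)                            # the two rows, concatenated
```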
“Deep learning is just a rebranding of neural networks.”
— Geoff Hinton (?), paraphrased from memory
Deep Learning Summer School, UCLA, 2012
“There has been a great deal of hype surrounding neural networks, making them seem magical and mysterious…”
I've talked enough about data, but clearly you need lots of storage.